Storage control apparatus, processing apparatus, computer system, and storage control method

ABSTRACT

A storage control apparatus, a storage control method, a processing apparatus, and a computer system are disclosed. The storage control apparatus includes: an address detection unit, adapted to detect whether any jump of physical addresses to which sequentially arriving write access requests are mapped occurs; and a logic control unit, adapted to use a no-write allocate policy if a cache is not hit and no jump of the physical addresses to which the plurality of sequentially arriving write access requests are mapped occurs, where in the no-write allocate policy, if a quantity of continuous jumps of the physical addresses to which the plurality of sequentially arriving write access requests are mapped is less than a preset quantity, the logic control unit keeps using the no-write allocate policy, where the preset quantity is greater than 1. When the quantity of continuous jumps of the physical addresses to which the sequentially arriving write access requests are mapped is less than the preset quantity, embodiments of the present disclosure can keep using the no-write allocate policy, and avoid selecting a write allocate policy during processing of information of a low access probability. Therefore, robustness and stability of the computer system are enhanced.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority to Chinese Patent Application No. 201910913440.9 filed Sep. 25, 2019, which is incorporated herein in its entirety.

TECHNICAL FIELD

The present invention relates to the processor field, and more specifically, to a storage control apparatus, a processing apparatus, a computer system, and a storage control method.

BACKGROUND OF THE INVENTION

To reduce a speed difference between a processor and a memory, a cache (Cache) is generally disposed between the processor and the memory. An access speed of the cache is higher than that of the memory. The cache is configured to temporarily store some data information and/or instruction information such as program data or packet data that may be repeatedly invoked in the memory. The memory herein is generally a main memory (or primary memory, system memory or memory for short). The memory is configured to store instruction information and/or data information that are/is indicated by a data signal, for example, store data provided by the processor and/or implement information exchange between the processor and an external storage device.

The processor may access the main memory by initiating a write access request, where the write access request specifies an address to which data needs to be written and the data that needs to be written. After the processor initiates the write access request, if a data block that needs to be accessed is already temporarily stored in the cache, which is referred to as a cache hit, the processor may directly access the cache without accessing the memory, thereby avoiding a long latency caused by memory access; or if a data block that needs to be accessed is not temporarily stored in the cache, which is referred to as a cache miss, the processor may use a write allocate policy or a no-write allocate policy to process the write access request.

Write allocate (WriteAllocate) policy: When a cache miss occurs, first, a read request is initiated to a lower-level memory of the cache, to invoke a data block that is in the memory and matches the write access request into the cache; and then the corresponding data block in the cache is updated based on data specified by the write access request.

No-write allocate (Write-No-Allocate) policy: When a cache miss occurs, a write request is directly initiated to the memory, to update a corresponding data block in the memory based on the data specified by the write access request, without modifying the cache.

In comparison with the write allocate policy, the no-write allocate policy takes a shorter time, and is more suitable for processing data that will not be accessed again within a short time. In the write allocate policy, the data specified by the write access request may be stored in the cache. Therefore, in comparison with the no-write allocate policy, the write allocate policy is more suitable for processing data that may be repeatedly accessed.
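The difference between the two policies on a write miss can be illustrated with a minimal sketch. The class name `TinySystem`, the 64-byte block size, and the dictionary-based cache and memory are assumptions introduced only for illustration; write-back of cached blocks to the memory and cache replacement are deliberately not modeled.

```python
BLOCK_SIZE = 64  # bytes per data block (assumed value, for illustration only)


class TinySystem:
    """Hypothetical model of a cache in front of a main memory."""

    def __init__(self):
        self.memory = {}        # block address -> bytearray (main memory)
        self.cache = {}         # block address -> bytearray (cached copies)
        self.memory_reads = 0   # read requests issued to the memory
        self.memory_writes = 0  # write requests issued to the memory

    def _memory_block(self, block_addr):
        # Blocks that were never written are treated as zero-filled.
        return self.memory.setdefault(block_addr, bytearray(BLOCK_SIZE))

    def write(self, addr, value, write_allocate):
        block_addr, offset = divmod(addr, BLOCK_SIZE)
        if block_addr in self.cache:
            # Cache hit: update the cached block, no memory access needed.
            self.cache[block_addr][offset] = value
            return "hit"
        if write_allocate:
            # Write allocate: fetch the block into the cache first,
            # then update the cached copy.
            self.memory_reads += 1
            self.cache[block_addr] = bytearray(self._memory_block(block_addr))
            self.cache[block_addr][offset] = value
            return "miss: write allocate"
        # No-write allocate: update the memory directly, cache untouched.
        self.memory_writes += 1
        self._memory_block(block_addr)[offset] = value
        return "miss: no-write allocate"
```

In this sketch, a second write to the same block hits only if the first miss was handled with write allocate, which is why that policy pays off for data that is accessed again soon.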

In a conventional solution, if it is detected that continuous write access requests are mapped to continuous physical addresses (that is, no jump occurs), the no-write allocate policy is used to respond to the write access requests. If it is detected that any jump of physical addresses to which continuous write access requests are mapped occurs, the processor directly quits a no-write allocate operation, and switches to a write allocate operation.

However, for continuous write access requests, some data that may not be repeatedly accessed may not necessarily correspond to continuous physical addresses. For example, in a memory copy operation process or the like, the processor may need to jump to other addresses at regular intervals to perform memory move operations or the like; in some processes, data blocks that need to be continuously accessed may have continuous virtual addresses and discontinuous physical addresses. In the conventional solution, a write allocate operation may be used in such cases of physical address jumps. Therefore, both efficiency and performance of the processor are reduced.

SUMMARY OF THE INVENTION

In view of this, embodiments of the present invention provide a storage control apparatus, a processing apparatus, a computer system, and a storage control method that have higher robustness, to resolve the foregoing problem.

To achieve this objective, according to a first aspect, the present invention provides a storage control apparatus, including: an address detection unit, adapted to detect whether any jump of physical addresses to which sequentially arriving write access requests are mapped occurs; and a logic control unit, coupled to the address detection unit, and adapted to use a no-write allocate policy to process the write access requests if a cache is not hit and no jump of the physical addresses to which the plurality of sequentially arriving write access requests are mapped occurs, where in the no-write allocate policy, if a quantity of continuous jumps of the physical addresses to which the plurality of sequentially arriving write access requests are mapped is less than a preset quantity, the logic control unit keeps using the no-write allocate policy, where the preset quantity is greater than 1.

In some embodiments, the write access request includes: a physical address to which a storage instruction is mapped; and written data specified by the storage instruction, where in the no-write allocate policy, the written data is written to a memory and is not written to the cache.

In some embodiments, the logic control unit is adapted to: in the no-write allocate policy, if the quantity of continuous jumps of the physical addresses to which the plurality of sequentially arriving write access requests are mapped is greater than or equal to the preset quantity, use a write allocate policy, where in the write allocate policy, the written data is written to the cache.
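The jump-tolerant selection described in the first aspect can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed hardware: the class name `JumpTolerantSelector`, the `stride` parameter (the address increment treated as "continuous"), and the choice to start in the write allocate policy are all hypothetical.

```python
WRITE_ALLOCATE = "write allocate"
NO_WRITE_ALLOCATE = "no-write allocate"


class JumpTolerantSelector:
    """Hypothetical policy selector that tolerates short runs of jumps."""

    def __init__(self, preset_quantity=2, stride=1):
        if preset_quantity <= 1:
            raise ValueError("the preset quantity must be greater than 1")
        self.preset_quantity = preset_quantity
        self.stride = stride          # assumed step between continuous addresses
        self.policy = WRITE_ALLOCATE  # assumed starting policy
        self.last_addr = None
        self.continuous_jumps = 0

    def observe(self, addr):
        """Return the policy to apply to a write miss mapped to addr."""
        previous, self.last_addr = self.last_addr, addr
        if previous is None:
            return self.policy  # nothing to compare against yet
        jump = addr != previous + self.stride
        if not jump:
            # Continuous addresses: use (or keep) no-write allocate.
            self.continuous_jumps = 0
            self.policy = NO_WRITE_ALLOCATE
        elif self.policy == NO_WRITE_ALLOCATE:
            self.continuous_jumps += 1
            if self.continuous_jumps >= self.preset_quantity:
                # Enough back-to-back jumps: fall back to write allocate.
                self.policy = WRITE_ALLOCATE
                self.continuous_jumps = 0
        return self.policy
```

With `preset_quantity=2`, a single isolated jump inside an otherwise continuous stream leaves the selector in the no-write allocate policy; only two jumps in a row switch it back to write allocate.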

In some embodiments, the logic control unit is adapted to: in an initial state, select to use the write allocate policy, and if no jump of the physical addresses to which the sequentially arriving write access requests are mapped occurs, exit the initial state, and enter a primary screening state; and in the primary screening state, use the no-write allocate policy, and if the quantity of continuous jumps of the physical addresses to which the plurality of sequentially arriving write access requests are mapped is equal to the preset quantity, return to the initial state.

In some embodiments, the logic control unit is further adapted to: in the primary screening state, if no jump of the physical addresses to which the sequentially arriving write access requests are mapped occurs, enter a level 1 caching state; and in the level 1 caching state, if any jump of the physical addresses to which the sequentially arriving write access requests are mapped occurs, return to the primary screening state.

In some embodiments, the logic control unit is further adapted to: in the level 1 caching state to a level K−1 caching state, if no jump of the physical addresses to which the plurality of sequentially arriving write access requests are mapped occurs, transition to a lower-level caching state; and in a level 2 caching state to a level K caching state, if any jump of the physical addresses to which the sequentially arriving write access requests are mapped occurs, transition to an upper-level caching state, where K is a natural number greater than or equal to 2.
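The state transitions described in the preceding embodiments can be sketched as a small state machine. The encoding below is an assumption made for illustration: states are integers (-1 for the initial state, 0 for the primary screening state, 1 to K for the caching states), "lower-level" is read as moving from level i to level i+1, "upper-level" as moving from level i to level i-1, and jumps are counted only while the machine is in the primary screening state. The text does not fix these details, so other readings are possible.

```python
class PolicyStateMachine:
    """Hypothetical sketch of the initial / primary screening / caching states."""

    INITIAL = -1  # write allocate policy is used
    PRIMARY = 0   # primary screening state, no-write allocate policy is used

    def __init__(self, k=2, preset_quantity=2):
        if k < 2 or preset_quantity <= 1:
            raise ValueError("expected k >= 2 and preset_quantity > 1")
        self.k = k
        self.preset_quantity = preset_quantity
        self.state = self.INITIAL
        self.jumps_in_primary = 0  # assumed: counted only in primary screening

    def step(self, jump):
        """Advance by one write access request; jump is True on an address jump."""
        if self.state == self.INITIAL:
            if not jump:
                self.state = self.PRIMARY
                self.jumps_in_primary = 0
        elif self.state == self.PRIMARY:
            if jump:
                self.jumps_in_primary += 1
                if self.jumps_in_primary == self.preset_quantity:
                    self.state = self.INITIAL
            else:
                self.jumps_in_primary = 0
                self.state = 1  # enter the level 1 caching state
        elif jump:
            self.state -= 1  # level 1 returns to primary screening
            if self.state == self.PRIMARY:
                self.jumps_in_primary = 0
        elif self.state < self.k:
            self.state += 1  # move deeper, up to level K
        return self.state

    def policy(self):
        if self.state == self.INITIAL:
            return "write allocate"
        return "no-write allocate"
```

Under these assumptions, the deeper the machine has moved into the caching states, the more consecutive jumps it takes to return to the initial state and the write allocate policy.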

In some embodiments, the storage control apparatus further includes a register configured to store a cache depth value, where the logic control unit is further adapted to: in an initial phase, use the write allocate policy, and reset the cache depth value to an initial value; if the sequentially arriving write access requests are sequentially mapped to continuous physical addresses, increase the cache depth value based on a first preset gradient; if any jump of the physical addresses to which the sequentially arriving write access requests are mapped occurs, decrease the cache depth value based on a second preset gradient; and when the cache depth value is less than a specified threshold, select to use the write allocate policy, or when the cache depth value is greater than or equal to the specified threshold, select to use the no-write allocate policy.

In some embodiments, the specified threshold is greater than or equal to a sum of the initial value and the first preset gradient.
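The cache depth value mechanism can be sketched as a saturating counter. The class name `DepthCounterSelector`, the upper bound `max_value`, and the decision to clamp the value at the initial value when decreasing are assumptions for illustration; the text specifies only the initial value, the two gradients, and the threshold comparison.

```python
class DepthCounterSelector:
    """Hypothetical sketch of policy selection driven by a cache depth value."""

    def __init__(self, initial_value=0, first_gradient=1,
                 second_gradient=1, threshold=1, max_value=4):
        if threshold < initial_value + first_gradient:
            raise ValueError(
                "threshold must be >= initial value + first preset gradient")
        self.initial_value = initial_value
        self.first_gradient = first_gradient    # step applied when no jump occurs
        self.second_gradient = second_gradient  # step applied when a jump occurs
        self.threshold = threshold
        self.max_value = max_value              # assumed saturation bound
        self.depth = initial_value              # reset in the initial phase

    def update(self, jump):
        """Update the depth value for one request and return the policy."""
        if jump:
            self.depth = max(self.initial_value,
                             self.depth - self.second_gradient)
        else:
            self.depth = min(self.max_value,
                             self.depth + self.first_gradient)
        if self.depth >= self.threshold:
            return "no-write allocate"
        return "write allocate"
```

Because the threshold is at least the initial value plus the first preset gradient, at least one continuous (no-jump) request is needed before the no-write allocate policy is selected, and the accumulated depth determines how many jumps are tolerated afterwards.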

In some embodiments, the write access request further includes write policy information, where the write policy information indicates one of the write allocate policy and the no-write allocate policy; and the logic control unit is configured to perform the following: screening the write policy information of the write access request, to select to use the no-write allocate policy; or using the write allocate policy based on the write policy information of the write access request.

In some embodiments, the storage control apparatus further includes: a read cache unit, adapted to initiate a read request to the memory in the write allocate policy, and store a data block returned by the memory in the cache, so that the data block is modified based on the written data.

In some embodiments, the storage control apparatus further includes: a write cache unit, adapted to initiate a write request to the memory in the no-write allocate policy, so that a corresponding data block in the memory is modified based on the written data.

In some embodiments, the preset quantity is set to a fixed value, or is determined based on a quantity of times that a memory access function is invoked, where the memory access function is implemented by at least one of the write access requests.

According to a second aspect, an embodiment of the present disclosure provides a processing apparatus, where the processing apparatus is a processor, a processor core, or a system on chip, and includes any one of the foregoing storage control apparatuses.

In some embodiments, the processing apparatus further includes an instruction execution unit, adapted to provide the write access request based on the storage instruction; and a hardware register, adapted to provide the write policy information.

In some embodiments, the processing apparatus further includes a memory management unit, coupled to the register, and adapted to provide an entry that matches a virtual address specified by the storage instruction, to translate the virtual address based on the entry into the physical address to which the storage instruction is mapped and provide the write policy information to the instruction execution unit.

In some embodiments, the hardware register is a global register.

According to a third aspect, an embodiment of the present disclosure further provides a storage control method, including: detecting whether any jump of physical addresses to which sequentially arriving write access requests are mapped occurs; and using a no-write allocate policy to process the write access requests if a cache is not hit and no jump of the physical addresses to which the plurality of sequentially arriving write access requests are mapped occurs, where in the no-write allocate policy, if a quantity of continuous jumps of the physical addresses to which the plurality of sequentially arriving write access requests are mapped is less than a preset quantity, keeping using the no-write allocate policy, where the preset quantity is greater than 1.

In some embodiments, the write access request includes: a physical address to which a storage instruction is mapped; and written data specified by the storage instruction, where in the no-write allocate policy, the written data is written to a memory and is not written to the cache.

In some embodiments, the storage control method further includes: in the no-write allocate policy, if the quantity of continuous jumps of the physical addresses to which the plurality of sequentially arriving write access requests are mapped is greater than or equal to the preset quantity, using, by the logic control unit, a write allocate policy, where in the write allocate policy, the written data is written to the cache.

In some embodiments, the storage control logic includes: in an initial state, selecting to use the write allocate policy, and if no jump of the physical addresses to which the plurality of sequentially arriving write access requests are mapped occurs, exiting, by the logic control unit, the initial state, and entering a primary screening state; and in the primary screening state, using the no-write allocate policy, and if the quantity of continuous jumps of the physical addresses to which the plurality of sequentially arriving write access requests are mapped is equal to the preset quantity, returning to the initial state.

In some embodiments, the storage control logic further includes: in the primary screening state, if no jump of the physical addresses to which the plurality of sequentially arriving write access requests are mapped occurs, entering a level 1 caching state; and in the level 1 caching state, if any jump of the physical addresses to which the sequentially arriving write access requests are mapped occurs, returning to the primary screening state.

In some embodiments, the storage control logic further includes: in the level 1 caching state to a level K−1 caching state, if no jump of the physical addresses to which the plurality of sequentially arriving write access requests are mapped occurs, transitioning to a lower-level caching state; and in a level 2 caching state to a level K caching state, if any jump of the physical addresses to which the sequentially arriving write access requests are mapped occurs, transitioning to an upper-level caching state, where K is a natural number greater than or equal to 2.

In some embodiments, the storage control logic includes: in an initial phase, using the write allocate policy, and resetting the cache depth value to an initial value; if no jump of the physical addresses to which the sequentially arriving write access requests are mapped occurs, increasing the cache depth value based on a first preset gradient; if any jump of the physical addresses to which the sequentially arriving write access requests are mapped occurs, decreasing the cache depth value based on a second preset gradient; and when the cache depth value is less than a specified threshold, selecting to use the write allocate policy, or when the cache depth value is greater than or equal to the specified threshold, selecting to use the no-write allocate policy.

In some embodiments, the specified threshold is greater than or equal to a sum of the initial value and the first preset gradient.

In some embodiments, the write access request further includes write policy information, where the write policy information indicates one of the write allocate policy and the no-write allocate policy; and by screening the write policy information of the write access request, the storage control logic selects to use the no-write allocate policy, or use the write allocate policy based on the write policy information of the write access request.

In some embodiments, the storage control method further includes: obtaining an entry that matches a virtual address specified by the storage instruction; translating, based on an identifier of the entry and the data, the virtual address specified by the storage instruction into the physical address to which the storage instruction is mapped; and providing the write policy information based on an attribute flag of the entry.

In some embodiments, the write policy information is provided by a global register.

In some embodiments, the storage control logic further includes: initiating a read request to the memory in the write allocate policy, and storing a data block returned by the memory in the cache, so that the data block is modified based on the written data.

In some embodiments, the storage control logic further includes: initiating a write request to the memory in the no-write allocate policy, so that a corresponding data block in the memory is modified based on the written data.

In some embodiments, the preset quantity is set to a fixed value, or is determined based on a quantity of times that a memory access function is invoked, where the memory access function is implemented by at least one of the write access requests.

According to a fourth aspect, an embodiment of the present disclosure further provides a computer system, including: any one of the foregoing processing apparatuses; a cache, coupled to the storage control apparatus; and a memory, coupled to the storage control apparatus.

In some embodiments, the computer system is implemented by a system-on-a-chip.

In comparison with a conventional solution, the storage control method, storage control apparatus, processing apparatus, and computer system provided by the embodiments of the present disclosure can detect whether a plurality of jumps of the physical addresses to which the sequentially arriving write access requests are mapped occur, and if a quantity of continuous jumps of the physical addresses to which the plurality of sequentially arriving write access requests are mapped is less than the preset quantity, keep using the no-write allocate policy, instead of storing the written data specified by the write access requests in the cache and/or the memory, to avoid, as much as possible, completing storage of the written data by selecting the write allocate policy during processing of information of a low access probability, avoid storing written data of a low access probability in the cache, improve performance and efficiency of the computer system, and enhance robustness and stability of the processor and the computer system.

BRIEF DESCRIPTION OF THE DRAWINGS

The above and other objectives, features, and advantages of the present invention will become more apparent by describing the embodiments of the present invention with reference to the following accompanying drawings. In the drawings:

FIG. 1 illustrates a schematic block diagram of a computer system according to an embodiment of the present invention;

FIG. 2 is a schematic block diagram of a processor according to an embodiment of the present invention;

FIG. 3 illustrates a partial schematic flowchart for executing a storage instruction according to an embodiment of the present invention;

FIG. 4 illustrates a schematic diagram of a storage control unit according to an embodiment of the present invention;

FIG. 5a to FIG. 5c respectively illustrate schematic state transition diagrams of a storage control logic according to an embodiment of the present invention; and

FIG. 6 illustrates a schematic flowchart of a storage control method according to an embodiment of the present invention.

DETAILED DESCRIPTION OF THE INVENTION

The following describes the present invention based on embodiments. The present invention, however, is not limited to these embodiments. The following description of the present invention gives some specific details. Without the description of such details, the present invention can still be fully understood by those skilled in the art. To avoid confusing the essence of the present invention, well-known methods, processes and procedures are not described in detail. In addition, the accompanying drawings are not necessarily drawn to scale.

The following terms are used in this specification.

Computer system: It is a general embedded system, a desktop computer, a server, a system on chip, or another system having an information processing capability.

Memory: It is a physical structure located within the computer system and used for storing information. By purpose, memories can be categorized into a main memory (also referred to as an internal memory, or simply referred to as a memory/main memory) and a secondary memory (also referred to as an external memory, or simply referred to as a secondary memory/external memory). The main memory is used for storing instruction information and/or data information represented by data signals, for example, used for storing data provided by a processor, or may be used for information exchange between the processor and the external memory. Information provided by the external memory needs to be transferred to the main memory before being accessible by the processor. Therefore, a memory mentioned herein is generally a main memory, and a storage device mentioned herein is generally an external memory.

Physical address (Physical Address, PA for short): It is an address on an address bus. The processor or other hardware may provide a physical address to the address bus to access the main memory. The physical address may also be referred to as an actual address, a real address, or an absolute address. “Continuous physical addresses” in this specification may correspond to binary codes whose numeric values are continuous. However, the embodiments of the present disclosure are not limited thereto. In some specific designs, alternatively, “continuous physical addresses” may correspond to binary codes in which only one bit is different.
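The two notions of "continuous physical addresses" mentioned above can be expressed as simple predicates. The function names and the `stride` parameter (the expected numeric step between consecutive requests) are assumptions for illustration.

```python
def is_numerically_continuous(prev_addr, curr_addr, stride=1):
    """True when the current address follows the previous one by `stride`."""
    return curr_addr == prev_addr + stride


def differs_in_one_bit(prev_addr, curr_addr):
    """True when the two binary codes differ in exactly one bit position."""
    diff = prev_addr ^ curr_addr
    # A nonzero value with a single set bit is a power of two.
    return diff != 0 and diff & (diff - 1) == 0
```

An address detection unit could treat any pair of consecutive requests that fails the chosen predicate as a jump; which predicate applies depends on the specific design.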

Virtual address: It is an abstract address used by software or a program. A virtual address space may be larger than a physical address space, and a virtual address may be mapped to a corresponding physical address.

Paging management mechanism: The virtual address space is divided into a plurality of parts, where each part is used as a virtual page (page). In addition, the physical address space is divided into a plurality of parts, where each part is used as a physical page (pageframe). The physical page is also referred to as a physical block or a physical pageframe.

Page table: It is used to specify a correspondence between a virtual page and a physical page, and generally stored in the main memory. The page table includes a plurality of entries, where each entry is used to specify a mapping relationship between a virtual page and a physical page and some management flags, so that a virtual address in a virtual page can be translated into a physical address in a corresponding physical page. Some entries included in the page table may be temporarily stored in a register outside the main memory, so that the entries are invoked in an address translation process.

Write allocate policy: When a cache miss occurs, first, a read request is initiated to a lower-level memory of a cache, to invoke a data block that is in the memory and matches a write access request into the cache; and then the corresponding data block in the cache is updated based on data specified by the write access request. A write allocate operation may also be referred to as a fetch-on-write (FetchOnWrite) operation.

No-write allocate policy: When a cache miss occurs, a write request is directly initiated to the memory, to update a corresponding data block in the memory based on the data specified by the write access request, without modifying the cache.

Cache hit and miss: When the processor needs to access information in the main memory, first, the cache may be searched for the required information. If the cache already stores the required information, which is referred to as a cache hit or a hit, no search needs to be performed on the main memory. If the cache does not store the required information, it is referred to as a cache miss, or may be referred to as a miss, or a failure.

Memory access function: It is a function that needs to access the memory. For example, it is a memory copy function or a memory initialization function.

Memory copy operation: It is an operation implemented by a memory copy function (memcpy) in the C and C++ languages, and is used to copy several pieces of data (each piece of data is, for example, one byte in length) from a source main memory address to a destination main memory address. Each memory copy operation may be implemented by invoking the memory copy function at least once.

Memory initialization operation: It is an operation implemented by using a memory initialization function (memset) in the C and C++ languages, and is used to set all specified content of the main memory to specified values. Each memory initialization operation may be implemented by invoking the memory initialization function at least once.

Operations such as memory copy or memory initialization generally need to write a series of information (instruction information and/or data information) to the main memory. The information is massive, but is not accessed frequently. Therefore, it is expected that the information should not be written to the cache, to save time and avoid occupancy of the cache by information of a low access probability.

When performing an operation such as memory copy or memory initialization that needs to update the main memory, the processor sequentially initiates a plurality of write access requests to modify corresponding information in the main memory. Each write access request specifies a physical address that needs to be accessed and data that needs to be written, and may further specify some auxiliary information, where the auxiliary information may include write allocate information used to indicate whether the write allocate policy is valid. “Continuous write access requests” or “sequentially arriving write access requests” mentioned in this specification are write access requests that are initiated continuously in sequence. This series of write access requests will be processed sequentially.

However, in some cases, sequentially arriving write access requests are mapped to discontinuous physical addresses. For example, when a memory copy operation is being performed, jumps of physical addresses that the processor needs to access may occur at intervals, that is, jumps of physical addresses specified by a series of write access requests initiated by the processor may occur. For another example, program-oriented virtual pages are continuous, but the continuous virtual pages may be mapped to discontinuous physical pages; consequently, continuous virtual addresses are mapped to discontinuous physical addresses.
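The second example, continuous virtual pages mapped to discontinuous physical pages, can be illustrated with a toy page table. The 4 KiB page size, the page-table contents, and the 1 KiB write stride are hypothetical values chosen only to make the jumps visible.

```python
PAGE_SIZE = 4096  # assumed page size in bytes
STRIDE = 1024     # assumed distance between consecutive write addresses

# Hypothetical page table: virtual page number -> physical page number.
PAGE_TABLE = {0: 7, 1: 3, 2: 9}


def translate(virtual_addr):
    """Translate a virtual address through the toy page table."""
    vpn, offset = divmod(virtual_addr, PAGE_SIZE)
    return PAGE_TABLE[vpn] * PAGE_SIZE + offset


def count_jumps(addrs, stride):
    """Count consecutive address pairs that are not exactly `stride` apart."""
    return sum(1 for prev, curr in zip(addrs, addrs[1:])
               if curr != prev + stride)


# A stream of writes that is perfectly continuous in virtual address space.
virtual_addrs = list(range(0, 3 * PAGE_SIZE, STRIDE))
physical_addrs = [translate(va) for va in virtual_addrs]

virtual_jumps = count_jumps(virtual_addrs, STRIDE)
physical_jumps = count_jumps(physical_addrs, STRIDE)
```

In this sketch the virtual stream contains no jumps, while the physical stream jumps once at each page boundary because the three physical pages are not adjacent. Each jump is isolated, which is the kind of pattern a preset quantity greater than 1 is meant to tolerate.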

In a conventional solution, if it is detected that continuous write access requests are mapped to continuous physical addresses (that is, no jump occurs), the no-write allocate policy is used to respond to the write access requests. If it is detected that any jump of physical addresses to which continuous write access requests are mapped occurs, the processor directly quits the no-write allocate policy, and switches to the write allocate policy. Based on the foregoing analysis, in a memory copy operation process or the like, continuously arriving write access requests may be mapped to discontinuous physical addresses (that is, a physical address jump occurs). Therefore, in the conventional solution, unnecessary switching between the write allocate policy and the no-write allocate policy occurs a plurality of times during a memory copy operation or the like, and performance and efficiency of the processor are reduced.

Based on this, an embodiment of this application provides a storage control solution. The storage control solution dynamically selects and switches between a write allocate policy and a no-write allocate policy based on physical addresses to which sequentially arriving write access requests are mapped. When a quantity of continuous jumps of physical addresses to which the plurality of sequentially arriving write access requests are mapped is less than a preset quantity, the solution can still keep using the no-write allocate policy, instead of storing written data specified by the write access requests in a cache and/or a memory. The solution returns from the no-write allocate policy to the write allocate policy only when the quantity of continuous jumps of the physical addresses to which the plurality of sequentially arriving write access requests are mapped is greater than or equal to the preset quantity. Therefore, processing of the write access requests is optimized, and robustness and efficiency of a processor are improved.

The following describes the embodiments of the present invention exemplarily by using an application scenario of a memory copy operation, but the embodiments of the present invention are not limited thereto. Based on the teachings of the embodiments of the present invention, this application may be further applied to other operations, for example, a memory initialization operation implemented by a memory initialization function or the like or by an instruction, or another related operation that may be oriented to discontinuous physical addresses and oriented to information of a low access probability.

System Overview

FIG. 1 illustrates a schematic block diagram of a computer system according to an embodiment of the present invention. The computer system 10 is an example of a “central” system architecture. The computer system 10 may be constructed based on processors of various models in a current market, and driven by an operating system such as a WINDOWS™ operating system, a UNIX operating system, or a Linux operating system. In addition, the computer system 10 may be implemented in hardware and/or software such as a PC, a desktop computer, a laptop computer, a notebook, a tablet, a server, a smart phone, or a mobile communications apparatus.

As shown in FIG. 1, the computer system 10 in this embodiment of the present invention may include one or more processors 12 and a memory 14. The memory 14 in the computer system 10 may be a main memory (which may also be referred to as system memory, primary memory or memory). The memory is configured to store instruction information and/or data information, for example, provided by the processor 12, or may be configured to implement data exchange between the processor 12 and an external storage device 16 (which may also be referred to as backing store, store, secondary memory or external memory).

In some cases, the processor 12 may need to access the memory 14 to obtain data in the memory 14 or modify data in the memory 14. Because an access speed of the memory 14 is low, to reduce a speed difference between the processor 12 and the memory 14, the computer system 10 further includes a cache 18 coupled to a bus 11, where the cache 18 is configured to temporarily store instructions and/or data such as program data or packet data that may be repeatedly invoked in the memory 14. The cache 18 is implemented by a type of storage apparatus, for example, a static random access memory (SRAM). The cache 18 may be a multi-level structure, for example, a three-level cache structure having a level 1 cache (L1 Cache), a level 2 cache (L2 Cache), and a level 3 cache (L3 Cache), or may be a cache structure of more than three levels or another type of cache structure. In some embodiments, a part of the cache 18 (for example, the level 1 cache, or the level 1 cache and the level 2 cache) may be integrated in the processor 12, or integrated with the processor 12 in a system-on-a-chip (SOC).

Based on this, the processor 12 may include parts such as an instruction execution unit 121, a memory management unit 122, and a storage control unit 123. When executing some instructions that need to modify the memory, the instruction execution unit 121 initiates a write access request, where the write access request specifies written data that needs to be written to the memory and a corresponding address. The memory management unit 122 is configured to translate a virtual address specified by the instructions into a physical address to which the virtual address is mapped. The storage control unit 123 is configured to perform a write allocate operation (which may also be referred to as a fetch on write operation) or a no-write allocate operation (which may also be referred to as a non-write allocate operation, write-no-allocate operation or write around operation) to store the written data in a storage location pointed to by the physical address to which the write access request is mapped.

Information exchange between the memory 14 and the cache 18 is generally organized based on blocks. In some embodiments, the cache 18 and the memory 14 may be divided into data blocks based on a same space dimension. A data block may be used as a smallest unit (including one or more pieces of data of a preset length) between the cache 18 and the memory 14. For brief and clear expression, each data block in the cache 18 is hereinafter referred to as a cache block for short (which may also be referred to as a cacheline or a cache line), and different cache blocks have different cache block addresses; each data block in the memory 14 is referred to as a memory block for short, and different memory blocks have different memory block addresses. A cache block address includes, for example, a physical address label used to locate a data block of the corresponding cache block.

Due to a space limitation and a resource limitation, the cache 18 cannot temporarily store all content in the memory 14. To be specific, a storage capacity of the cache 18 is generally smaller than that of the memory 14, and cache block addresses provided by the cache 18 cannot correspond to all memory block addresses provided by the memory 14. When the processor 12 needs to access the memory, the processor 12 first accesses the cache 18 by using the bus 11, to determine whether content to be accessed is already stored in the cache 18. When the content to be accessed is in the cache 18, the access is referred to as a cache hit, and the processor 12 directly invokes the content to be accessed from the cache 18. When the content to be accessed by the processor 12 is not in the cache 18, the access is referred to as a cache miss, and the processor 12 needs to access the memory 14 by using the bus 11, to find corresponding information in the memory 14. Because an access speed of the cache 18 is very high, when the cache 18 is hit, efficiency of the processor 12 can be improved significantly, and further, performance and efficiency of the entire computer system 10 are also improved.

In addition, the computer system 10 may further include a storage device 16, a display device 13, an audio device 14, and an input/output device such as a mouse or keyboard 15. The storage device 16 is, for example, a device used for information access, such as a hard disk coupled to the bus 11 by using a corresponding interface, an optical disc, and a flash memory. The display device 13 is, for example, coupled to the bus 11 by using a corresponding video card, and configured to perform displaying based on a display signal provided by the bus 11.

The computer system 10 generally further includes a communications device 17, and therefore can communicate with a network or another device by various means. The communications device 17 may include, for example, one or more communications modules. For example, the communications device 17 may include a wireless communications module applicable to a specific wireless communications protocol. For example, the communications device 17 may include a WLAN module, configured to implement Wi-Fi communication in compliance with the 802.11 standard defined by the Institute of Electrical and Electronics Engineers (IEEE). The communications device 17 may also include a WWAN module, configured to implement wireless wide area network communication in compliance with the cellular protocol or other wireless wide area network protocols. The communications device 17 may further include a communications module using any other protocol, for example, a Bluetooth module, or another customized communications module. The communications device 17 may also be a port used for serial transmission of data.

Certainly, structures of different computer systems may also vary depending on different mother boards, operating systems, and instruction set architectures. For example, currently, many computer systems are equipped with an input/output control center connected between the bus 11 and each input/output device, and the input/output control center may be integrated in the processor 12 or may be independent of the processor 12.

As described hereinafter, the storage control unit 123 in the computer system 10 in this embodiment of the present disclosure detects whether a plurality of jumps of physical addresses to which sequentially arriving write access requests are mapped occur, and selects one of a write allocate operation and a no-write allocate operation to store written data specified by the write access requests in the cache 18 and/or the memory 14. This avoids, as much as possible, selecting the write allocate operation during processing of information of a low access probability, avoids storing written data of a low access probability in the cache, and improves performance and efficiency of the computer system.

Processor

FIG. 2 is a schematic block diagram of the processor 12 according to an embodiment of the present invention. In some embodiments, each processor 12 may include one or more processor cores 120 configured to process instructions. Processing and execution of the instructions may be controlled by a user (for example, by using an application program) and/or a system platform. In some embodiments, each processor core 120 may be configured to process a specific instruction set. In some embodiments, the instruction set may support complex instruction set computing (CISC), reduced instruction set computing (RISC), or very long instruction word (VLIW)-based computing. Different processor cores 120 may process different instruction sets or a same instruction set. In some embodiments, the processor core 120 may further include other processing modules, for example, a digital signal processor (DSP). As an example, FIG. 2 illustrates processor cores 1 to m, where m is a non-zero natural number.

In some embodiments, the cache 18 shown in FIG. 1 may be completely or partly integrated in the processor 12. In addition, based on different architectures, the cache 18 may be a single internal cache or multi-level caches located in and/or outside each processor core 120 (for example, three-level caches L1 to L3 shown in FIG. 2, all of which are identified as 18 in FIG. 2), or may include an instruction cache and a data cache. In some embodiments, each component of the processor 12 may share at least one part of the cache. For example, as shown in FIG. 2, the processor cores 1 to m share the level 3 cache L3. The processor 12 may further include an external cache (not illustrated). Alternatively, another cache structure may be used as an external cache of the processor 12.

In some embodiments, as shown in FIG. 2, the processor 12 may include a register file 126 (Register File). The register file 126 may include a plurality of registers configured to store different types of data and/or instructions, and the registers may be of different types. For example, the register file 126 may include an integer register, a floating-point register, a status register, an instruction register, and a pointer register. The registers in the register file 126 may be implemented by using general registers, or may be particularly designed based on an actual requirement of the processor 12.

The processor 12 may include the memory management unit (MMU) 122, configured to translate a virtual address into a physical address. Some entries in a page table are temporarily stored in the memory management unit 122. The memory management unit 122 may also obtain, from the memory, entries that are not temporarily stored. One or more memory management units 122 may be disposed in each processor core 120. Memory management units 122 in different processor cores 120 may also implement synchronization with memory management units 122 located in other processors or processor cores, so that each processor or processor core can share a unified virtual storage system.

The processor 12 is configured to execute an instruction sequence (that is, a program). A process of executing each instruction by the processor 12 includes steps of fetching an instruction from the memory that stores the instruction, decoding the fetched instruction, executing the decoded instruction, saving an instruction execution result, and the like. This cycle is repeated until all instructions in the instruction sequence are executed or a shutdown instruction is encountered.

To implement the foregoing process, the processor 12 may include an instruction fetch unit 124, an instruction decoding unit 125, an instruction transmission unit (not shown), an instruction execution unit 121, an instruction retirement unit (not shown), and the like.

The instruction fetch unit 124, as a start engine of the processor 12, is configured to move an instruction from the memory 14 to an instruction register (which may be a register for storing an instruction, in the register file 126 shown in FIG. 2), and receive a next instruction fetch address or obtain a next instruction fetch address through a calculation based on an instruction fetch algorithm, where the instruction fetch algorithm includes, for example, increasing or decreasing addresses based on a length of an instruction.

After fetching an instruction, the processor 12 enters an instruction decoding phase. The instruction decoding unit 125 decodes the fetched instruction based on a predetermined instruction format, to obtain operand obtaining information required by the fetched instruction, to prepare for an operation of the instruction execution unit 121. The operand obtaining information points to, for example, an immediate, a register, or other software or hardware that can provide a source operand.

The instruction transmission unit generally exists in the high-performance processor 12, is located between the instruction decoding unit 125 and the instruction execution unit 121, and is configured to schedule and control instructions, to allocate each instruction to different instruction execution units 121 efficiently, so that parallel operations of a plurality of instructions become possible. After the instruction is fetched, decoded, and scheduled to a corresponding instruction execution unit 121, the corresponding instruction execution unit 121 starts to execute the instruction, that is, perform an operation indicated by the instruction, and implement a corresponding function.

The instruction retirement unit (also referred to as an instruction write-back unit) is mainly responsible for writing back an execution result generated by the instruction execution unit 121 to a corresponding storage location (for example, an internal register of the processor 12), so that the corresponding execution result can be quickly obtained from the storage location by a subsequent instruction.

For instructions of different types, different instruction execution units 121 may be correspondingly disposed in the processor 12. The instruction execution unit 121 may be an operation unit (for example, including an arithmetic logic unit or a vector operation unit, and configured to perform an operation based on an operand and output an operation result), a memory execution unit (for example, configured to access the memory based on an instruction to read data in the memory or write specified data to the memory), a coprocessor, or the like. In the processor 12, the instruction execution units 121 may run in parallel and output corresponding execution results.

When executing a type of instruction (for example, a memory access instruction), the instruction execution unit 121 needs to access the memory 14, to obtain information stored in the memory 14 or provide data that needs to be written to the memory 14.

It should be noted that the instruction execution unit 121 configured to execute the memory access instruction may also be referred to as a memory execution unit for short. The memory execution unit is, for example, a load store unit (LSU) and/or another unit used for memory access.

After the memory access instruction is obtained by the instruction fetch unit 124, the instruction decoding unit 125 may decode the memory access instruction, so that a source operand of the memory access instruction can be obtained. The decoded memory access instruction is provided to the corresponding instruction execution unit 121, and the instruction execution unit 121 may perform a corresponding operation (for example, the arithmetic logic unit performs an operation on the source operand stored in the register) on the source operand of the memory access instruction to obtain address information corresponding to the memory access instruction, and initiate a corresponding request based on the address information, for example, an address translation request, or a write access request.

The source operand of the memory access instruction generally includes an address operand. The instruction execution unit 121 performs an operation on the address operand to obtain a virtual address or physical address corresponding to the memory access instruction. When the memory management unit 122 is disabled, the instruction execution unit 121 may obtain the physical address of the memory access instruction directly through a logic operation. When the memory management unit 122 is enabled, the corresponding instruction execution unit 121 initiates an address translation request based on the virtual address corresponding to the memory access instruction, where the address translation request includes the virtual address corresponding to the address operand of the memory access instruction; and the memory management unit 122 responds to the address translation request, and translates the virtual address in the address translation request into the physical address based on an entry that matches the virtual address, so that the instruction execution unit 121 can access the cache 18 and/or the memory 14 based on the translated physical address.
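The two address paths described above can be illustrated with a minimal Python sketch. The 4 KiB page size, the dictionary used as a page table, and the names `translate`, `PAGE_SHIFT`, and `mmu_enabled` are assumptions made only for illustration; the embodiments do not prescribe a page size or a page table layout.

```python
PAGE_SHIFT = 12                      # assumed 4 KiB pages (not specified in the text)
PAGE_MASK = (1 << PAGE_SHIFT) - 1

def translate(address, page_table, mmu_enabled=True):
    """Return the physical address for a memory access (hypothetical helper)."""
    if not mmu_enabled:
        # MMU disabled: the computed address is used directly as the physical address.
        return address
    virtual_page = address >> PAGE_SHIFT
    if virtual_page not in page_table:
        raise LookupError("no page table entry matches the virtual address")
    physical_page = page_table[virtual_page]
    # Keep the in-page offset and swap the page number.
    return (physical_page << PAGE_SHIFT) | (address & PAGE_MASK)

# Usage: virtual page 0x4 is mapped to physical page 0x9A.
table = {0x4: 0x9A}
physical = translate(0x4123, table)
```

The in-page offset is unchanged by translation, which is why consecutive virtual addresses inside one page stay consecutive physically, whereas a page boundary may produce a jump of physical addresses.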

The memory access instruction may include a load instruction and a storage instruction based on different functions. In a process of executing the load instruction, information in the memory 14 or the cache 18 generally does not need to be modified, and the instruction execution unit 121 only needs to read, based on an address operand of the load instruction, data stored in the memory 14, the cache 18, or an external storage device.

Different from that of the load instruction, a source operand of the storage instruction includes not only an address operand, but also data information. In a process of executing the storage instruction, the memory 14 and/or the cache 18 generally need/needs to be modified. The data information of the storage instruction may point to written data, where a source of the written data may be an execution result of an operation instruction, a load instruction, or the like, or may be data provided by a register in the processor 12 or another storage unit, or may be an immediate.

A memory copy operation is generally used to copy several bytes stored in a source storage area of the memory 14 to a destination storage area of the memory 14. A function format of a memory copy function (memcpy) is, for example,

memcpy(destin, source, n)

where destin points to a destination storage area (a storage area of the memory 14) that needs to store replicated content, source points to a source storage area (a storage area of the memory 14) where the replicated content is located, and n indicates a quantity of bytes included in the replicated content.

The memory copy function may be implemented by a series of memory access instructions. The processor 12 may obtain the n-byte replicated content stored in the source storage area, and execute a storage instruction stream, to write the replicated content to the destination storage area. The replicated content may be divided into a plurality of pieces of written data based on bytes or words or other units, and each storage instruction is used to write a corresponding piece of written data to a corresponding storage location in the destination storage area.
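As a rough illustration of how a memory copy breaks down into a storage instruction stream, the following Python sketch models the memory as a `bytearray` and issues one store per piece of written data. The function name `memcpy_as_stores`, the 4-byte default unit, and the returned list of (address, data) pairs are assumptions for illustration, not part of the described embodiments.

```python
def memcpy_as_stores(memory, destin, source, n, unit=4):
    """Copy n bytes inside `memory` using one load and one store per piece.

    Returns the list of (destination address, written data) pairs, which
    stands in for the stream of write access requests (illustrative only).
    """
    write_requests = []
    offset = 0
    while offset < n:
        size = min(unit, n - offset)          # the last piece may be shorter
        # Load: read one piece of the replicated content from the source area.
        data = bytes(memory[source + offset:source + offset + size])
        # Store: write that piece to the matching location in the destination area.
        memory[destin + offset:destin + offset + size] = data
        write_requests.append((destin + offset, data))
        offset += size
    return write_requests

# Usage: copy 10 bytes from address 0 to address 16 in 4-byte pieces.
memory = bytearray(range(16)) + bytearray(16)
requests = memcpy_as_stores(memory, destin=16, source=0, n=10)
```

The destination addresses of successive requests increase monotonically, which is the access pattern the storage control unit looks for when it decides that a write stream is unlikely to be reused soon.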

FIG. 3 illustrates a partial schematic flowchart for executing a storage instruction according to an embodiment of the present invention. With reference to FIG. 2 and FIG. 3, the following exemplarily describes a process of executing a storage instruction.

In operation 310 shown in FIG. 3, the instruction execution unit 121 initiates a write access request. The write access request includes written data specified by a storage instruction and a physical address to which the storage instruction is mapped.

In operation 310, the instruction execution unit 121 may first initiate an address translation request based on an address operand of the storage instruction, where the address translation request includes a virtual address corresponding to the address operand of the storage instruction; and then the memory management unit 122 responds to the address translation request, and translates, based on a corresponding entry in a page table, the virtual address in the address translation request into a physical address that can be used to access the memory, that is, the physical address to which the storage instruction is mapped.

The write access request may further specify some auxiliary information, where the auxiliary information may include write allocate information used to indicate whether a write allocate policy is valid. The write allocate information may come from global configuration information provided by the register file 126, or may be provided by an attribute flag of an entry that is in the page table and matches the virtual address of the storage instruction, or may come from other configuration information.

In operation 320 shown in FIG. 3, in response to the write access request, the storage control unit 123 compares the physical address to which the storage instruction is mapped, with each cache block address in the cache, to determine whether the access is a cache hit or a cache miss.

A method for determining whether the access is a cache hit may include: determining whether a cache block address of the cache 18 matches the physical address to which the write access request is mapped; and if a cache block address of a cache block matches a physical page number of the physical address to which the write access request is mapped, continuing to determine, based on the physical address to which the write access request is mapped, whether a corresponding cache entry (where each cache block includes a plurality of cache entries that may be indexed by several bits of the physical address) exists in the matched cache block; where if yes, the cache 18 is hit; or if no cache block or cache entry to which the write access request is mapped exists in the cache 18, it indicates that the write access request does not hit the cache 18.
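The two-step hit check described above (match the cache block, then look for the cache entry) might be sketched in Python as follows. The 4 KiB page size, the 64-byte entry size, and the nested-dictionary cache model keyed by physical page number are all assumptions chosen for illustration; a hardware cache would use tag comparators rather than dictionaries.

```python
PAGE_SHIFT = 12    # assumed: bits above this form the physical page number
ENTRY_SHIFT = 6    # assumed: 64-byte cache entries indexed by address bits 6..11

def is_cache_hit(cache, physical_address):
    """Return True if both the cache block and the cache entry are present.

    `cache` maps a physical page number to a dict of entry index -> data
    (a hypothetical model of cache blocks and their cache entries).
    """
    page_number = physical_address >> PAGE_SHIFT
    entries = cache.get(page_number)
    if entries is None:
        return False                       # no cache block matches the page number
    in_page_offset = physical_address & ((1 << PAGE_SHIFT) - 1)
    entry_index = in_page_offset >> ENTRY_SHIFT
    return entry_index in entries          # the entry must also exist in the block

# Usage: one cached block for page 0x12 holding entries 0 and 3.
cache = {0x12: {0: b"entry-0", 3: b"entry-3"}}
hit = is_cache_hit(cache, 0x120C5)
```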

If the cache is hit, operation 330 is performed: The instruction execution unit 121 updates the corresponding cache entry based on the written data specified by the write access request.

If the cache is not hit, the following operation 340 is performed.

In operation 340 shown in FIG. 3, the storage control unit 123 selects a write allocate policy or a no-write allocate policy to process the write access request, to update a corresponding cache block and/or a memory block based on the written data specified by the storage instruction.

It should be noted that the write allocate policy itself does not affect a function of the computer system, but only affects performance of the computer system and the processor, for example, reduces speeds of executing some storage instructions by the computer system and the processor.

In some embodiments, the storage control unit 123 may be implemented by software and/or hardware. For example, the storage control unit 123 implemented by hardware may be integrated in the processor 12 or the cache 18, or integrated with the processor 12 and/or the cache 18 in a same system-on-a-chip, or coupled to the processor and the cache in another form.

Storage Control Unit

FIG. 4 illustrates a schematic diagram of the storage control unit 123 according to an embodiment of the present invention. The storage control unit may be a storage control apparatus implemented by hardware and/or software. The storage control apparatus may be integrated in the processor 12 or the cache 18, or may be integrated with the processor 12 and/or the cache 18 in a same system-on-a-chip, or may be a packaged chip independent of the processor 12 and the cache 18. Because the storage control unit 123 may be configured to select one of a write allocate policy and a no-write allocate policy, the storage control unit may also be referred to as an allocate mode regulator (AMR).

The storage control unit 123 is configured to: receive a write access request, determine whether any jump of physical addresses to which sequentially arriving write access requests are mapped occurs, and select one of a write allocate policy and a no-write allocate policy based on determined jump result information and a storage control logic, so that when the cache is not hit, the selected write allocate policy or no-write allocate policy is used to process a corresponding write access request to update the memory 14 and/or the cache 18.

The storage control unit 123 may include an address detection unit 33, a logic control unit 34, a search unit 35, a write cache unit 32, and a read cache unit 31. However, this embodiment of this application is not limited thereto.

In a process of performing a memory copy operation, the storage control unit 123 receives a series of write access requests corresponding to a storage instruction stream. The search unit 35 is configured to compare a physical address to which a write access request is mapped, with each cache block address in the cache 18, to determine whether the cache is hit.

If the cache 18 is hit, written data specified by the write access request is directly written to a corresponding cache block. The cache block may be marked as a dirty block. In some subsequent steps, the cache block marked as the dirty block may be unified with a corresponding memory block in the memory 14, and a mark used for indicating the dirty block is removed afterward in accordance with a write-back policy.

If the cache is not hit, the logic control unit 34 selects the write allocate policy or the no-write allocate policy based on the storage control logic and the determined jump result information provided by the address detection unit 33.

In the write allocate policy, the logic control unit 34 controls the read cache unit 31 to initiate a read request, so that the memory 14 returns, in response to the read request, a required data block to the read cache unit 31. Then the read cache unit 31 writes the data block to a corresponding cache block address of the cache 18. Then the search unit 35 may perform a search again among cache blocks in the cache. Because a cache block that matches the physical address to which the write access request is mapped exists in the cache, the cache is hit, and the specified written data can be directly stored in the corresponding cache block, to modify data temporarily stored in the cache.

In the no-write allocate policy, the logic control unit 34 controls the write cache unit 32 to initiate a write request to the memory 14. The write cache unit 32 temporarily stores the written data specified by the write access request and updates a corresponding memory block in the memory 14 based on the temporarily stored written data, without writing the corresponding memory block in the memory 14 to the cache or modifying the data stored in the cache.
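The difference between the two policies on a cache miss can be illustrated with a minimal Python sketch. The 16-byte block size, the dictionary cache model, and the name `handle_write_miss` are assumptions for illustration; the sketch also assumes the written data does not cross a block boundary and does not model dirty marks or write-back.

```python
BLOCK_SIZE = 16   # assumed block size shared by cache blocks and memory blocks

def handle_write_miss(cache, memory, address, data, write_allocate):
    """Process a write access request that missed the cache (illustrative).

    `cache` maps a block number to a bytearray; `memory` is a bytearray.
    """
    block = address // BLOCK_SIZE
    offset = address % BLOCK_SIZE
    if write_allocate:
        # Write allocate: fetch the memory block into the cache first,
        # then the write hits and modifies the cached copy.
        start = block * BLOCK_SIZE
        cache[block] = bytearray(memory[start:start + BLOCK_SIZE])
        cache[block][offset:offset + len(data)] = data
    else:
        # No-write allocate: update the memory block directly and
        # leave the cache untouched.
        memory[address:address + len(data)] = data

# Usage: one miss handled under each policy.
memory = bytearray(64)
cache = {}
handle_write_miss(cache, memory, 18, b"\xAA", write_allocate=True)
handle_write_miss(cache, memory, 40, b"\xBB", write_allocate=False)
```

Under write allocate the new value initially lives only in the cached block, while under no-write allocate no cache block is consumed, which is the desired behavior for written data of a low access probability.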

The address detection unit 33 is configured to detect a physical address to which a write access request is mapped, to determine a jump result of whether any jump of physical addresses to which sequentially arriving write access requests are mapped occurs, and provide the determined jump result information. For example, when physical addresses to which two continuous write access requests are mapped correspond to discontinuous physical pages, the physical addresses to which the two write access requests are mapped are discontinuous. In this case, the determined jump result information indicates that jumps of the physical addresses to which the sequentially arriving write access requests are mapped occur. The determined jump result information may be stored in a corresponding register, and includes, for example: whether a physical address to which a current write access request is mapped and a physical address to which a previously arriving write access request is mapped are continuous; a quantity of continuous jumps of the physical addresses to which the sequentially arriving write access requests are mapped; and whether no continuous jump of the physical addresses to which the sequentially arriving write access requests are mapped occurs. However, the determined jump result information in this embodiment of the present disclosure is not limited thereto.

In some embodiments, when the determined jump result information indicates that no jump of the physical addresses to which the plurality of sequentially arriving write access requests are mapped occurs, the logic control unit 34 may select the no-write allocate policy to process the current write access request. In the no-write allocate policy, if the determined jump result information indicates that a quantity of continuous jumps of the physical addresses to which the plurality of sequentially arriving write access requests are mapped is greater than or equal to a preset quantity y, the logic control unit 34 switches from the no-write allocate policy back to the write allocate policy. The preset quantity y may be set to a fixed value greater than 1, or may be determined based on a quantity of times that the foregoing memory copy function is invoked in a current memory copy operation, or may be set or dynamically adjusted based on another factor.

In some embodiments, the address detection unit 33 may temporarily store the physical address to which the previously received write access request is mapped (for example, temporarily store by using a corresponding register), and compare, in a bitwise manner, the temporarily stored physical address of the previously received write access request with the physical address to which the currently received write access request is mapped, to determine whether a jump of physical addresses, to which the two sequentially arriving write access requests are mapped, occurs.
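A minimal Python sketch of such an address detection unit is given below. It keeps the previous physical address in a field that stands in for the register and counts continuous jumps. The jump criterion used here (the current physical page is neither the same as nor immediately after the previous page) and the 4 KiB page size are assumptions for illustration; the text only requires that discontinuous physical pages be reported as a jump.

```python
PAGE_SHIFT = 12   # assumed 4 KiB physical pages

class AddressDetector:
    """Hypothetical model of the address detection unit 33."""

    def __init__(self):
        self.previous = None          # stands in for the register holding the last address
        self.continuous_jumps = 0     # quantity of continuous jumps seen so far

    def observe(self, physical_address):
        """Record one write access request and return True if a jump occurred."""
        jumped = False
        if self.previous is not None:
            previous_page = self.previous >> PAGE_SHIFT
            page = physical_address >> PAGE_SHIFT
            # Assumed criterion: same page or the next page counts as continuous.
            jumped = page not in (previous_page, previous_page + 1)
        self.continuous_jumps = self.continuous_jumps + 1 if jumped else 0
        self.previous = physical_address
        return jumped

# Usage: feed a short stream of physical addresses.
detector = AddressDetector()
results = [detector.observe(a) for a in (0x1000, 0x1008, 0x2000, 0x9000, 0x3000)]
```

The counter resets whenever two consecutive requests are continuous, so it only ever reports runs of back-to-back jumps, which is the quantity compared against the preset quantity y.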

In some embodiments, the write access request may include the foregoing write allocate information. When the write allocate information indicates the write allocate policy, the logic control unit 34 may screen the write allocate information of the write access request when selecting the no-write allocate policy. If the logic control unit 34 selects the write allocate policy, the logic control unit 34 does not screen the write allocate information, and instead uses the write allocate policy to process the write access request. The write allocate information is, for example, a flag stored in a corresponding register.

The storage control logic may be implemented by hardware and/or software, for example, by components such as a counter, some registers in a register file, and a logic circuit.

Based on the foregoing analysis, the following exemplarily describes a storage control method implemented by a storage control logic.

Embodiment 1

According to one embodiment, a storage control logic is implemented by a state machine. The state machine may include an initial state and a primary screening state, but this application is not limited thereto.

FIG. 5a illustrates a schematic state transition diagram of the storage control logic according to this embodiment of the present invention.

Initial state: The storage control logic uses a write allocate policy. In this state, if determined jump result information indicates that no jump of physical addresses to which sequentially arriving i write access requests are mapped occurs, the storage control logic exits the initial state and enters the primary screening state; otherwise, the storage control logic keeps the initial state. The i write access requests herein point to i continuous physical addresses, and the i continuous physical addresses correspond to, for example, several bytes or dozens of bytes of data, where i is a natural number greater than or equal to 1.

Primary screening state: The storage control logic uses a no-write allocate policy. For example, this is implemented by screening write allocate information in a write access request. In this state, if Y1 continuous jumps of physical addresses to which a plurality of sequentially arriving write access requests are mapped occur, the storage control logic returns to the initial state; or if a quantity of continuous jumps of physical addresses to which sequentially arriving write access requests are mapped is less than Y1, or no jump occurs, the storage control logic keeps the primary screening state, to use the no-write allocate policy to process a current write access request. In this embodiment, Y1 is equal to a preset quantity y.

As can be seen, in this embodiment, a condition for transitioning from the initial state to the primary screening state is “no jump of physical addresses to which sequentially arriving i write access requests are mapped occurs”, and a condition for transitioning from the primary screening state to the initial state is “Y1 continuous jumps of physical addresses to which sequentially arriving write access requests are mapped occur”.
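The two transition conditions of Embodiment 1 can be sketched as a small Python state machine. The function name `run_embodiment1`, the default values i = 2 and Y1 = 3, and the simplification that each request is represented by a single "jumped" flag (so that i is counted as i consecutive no-jump observations) are assumptions for illustration only.

```python
def run_embodiment1(jumps, i=2, y1=3):
    """Return the policy applied to each write access request (illustrative).

    `jumps` is a sequence of booleans: True means the request's physical
    address jumped relative to the previous request.
    """
    state = "initial"
    no_jump_run = 0     # consecutive requests without a jump
    jump_run = 0        # consecutive requests with a jump
    policies = []
    for jumped in jumps:
        if jumped:
            jump_run += 1
            no_jump_run = 0
        else:
            no_jump_run += 1
            jump_run = 0
        if state == "initial" and no_jump_run >= i:
            state = "primary_screening"       # continuous stream detected
        elif state == "primary_screening" and jump_run >= y1:
            state = "initial"                 # Y1 continuous jumps: restore write allocate
        policies.append("write_allocate" if state == "initial" else "no_write_allocate")
    return policies

# Usage: two isolated jumps do not leave the primary screening state; three in a row do.
trace = run_embodiment1([False, False, True, True, False, True, True, True, False])
```

In this trace the no-write allocate policy is kept through the run of two jumps and is abandoned only on the third consecutive jump, which mirrors the requirement that the preset quantity be greater than 1.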

As can be known from the foregoing example, in this embodiment, in an operation such as memory copy that is used to process data of a low access frequency, unnecessary switching between the write allocate policy and the no-write allocate policy that is caused by any jump of physical addresses to which write access requests are mapped is avoided, and performance and efficiency of a processor are improved.

After the memory copy operation is ended, the storage control logic restores the write allocate policy based on a subsequent write access request.

Embodiment 2

According to one embodiment, a storage control logic is implemented by a state machine. The state machine may include an initial state, a primary screening state, and a level 1 caching state, but this application is not limited thereto.

FIG. 5b illustrates a schematic state transition diagram of the storage control logic according to this embodiment of the present invention.

Initial state: The storage control logic uses a write allocate policy. In this state, if determined jump result information indicates that no jump of physical addresses to which sequentially arriving i write access requests are mapped occurs, the storage control logic exits the initial state and enters the primary screening state; otherwise, the storage control logic keeps the initial state. A definition of i herein is the same as that in Embodiment 1, and is not described again herein.

Primary screening state: The storage control logic uses a no-write allocate policy. For example, this is implemented by screening write allocate information in a write access request. In this state, if determined jump result information indicates that Y2 continuous jumps of physical addresses to which sequentially arriving write access requests are mapped occur, the storage control logic exits the primary screening state and returns to the initial state; or if continuously arriving write access requests are mapped to continuous physical addresses, the storage control logic exits the primary screening state and enters the level 1 caching state. In other cases, no state transition occurs. Y2 may be set to a fixed value greater than 1, or may be determined based on a quantity of times that the foregoing memory copy function is invoked in a current memory copy operation, or may be set or dynamically adjusted based on another factor.

Level 1 caching state: The storage control logic uses the no-write allocate policy. In this state, if determined jump result information indicates that a plurality of continuously arriving write access requests are sequentially mapped to continuous physical addresses, the storage control logic keeps the level 1 caching state; or if determined jump result information indicates that any jump of physical addresses to which sequentially arriving write access requests are mapped occurs, the storage control logic exits the level 1 caching state and returns to the primary screening state.

In this embodiment, the storage control logic in the initial state is mainly configured to process some operations other than the memory copy operation, so that written data specified by write access requests initiated by the operations can be written to a cache by using the write allocate policy, to facilitate access. Both the primary screening state and the level 1 caching state are states using the no-write allocate policy, but screening strength of the primary screening state is different from that of the level 1 caching state. The screening strength herein indicates a quantity, that is, a preset quantity y, of continuous jumps of physical addresses to which sequentially arriving write access requests are mapped, required for returning from a current state to the initial state.

A condition for transitioning from the primary screening state to the initial state is “Y2 continuous jumps of physical addresses to which sequentially arriving write access requests are mapped occur”. Therefore, the screening strength of the primary screening state is equal to Y2, that is, in the primary screening state, the preset quantity y is equal to Y2.

To return from the level 1 caching state to the initial state, the storage control logic needs to go through the primary screening state. In addition, after returning from the level 1 caching state to the primary screening state, if write access requests are mapped to a plurality of continuous physical addresses again, the storage control logic may transition from the primary screening state to the level 1 caching state again. Therefore, the screening strength of the level 1 caching state is greater than Y2 (that is, in the level 1 caching state, the preset quantity y is greater than Y2). As a result, unnecessary switching between the write allocate policy and the no-write allocate policy that is caused by a single jump or a few jumps of physical addresses can be avoided.

Based on this embodiment, preferably, a condition for transitioning from the initial state to the primary screening state is “no jump of physical addresses to which sequentially arriving i write access requests are mapped occurs”, a condition for transitioning from the primary screening state to the level 1 caching state is “no jump of physical addresses to which a plurality of sequentially arriving write access requests are mapped occurs”, a condition for transitioning from the level 1 caching state to the primary screening state is “physical addresses to which a currently received write access request and a previously arriving write access request are mapped are discontinuous”, and a condition for transitioning from the primary screening state to the initial state is “Y2 continuous jumps of physical addresses to which a plurality of sequentially arriving write access requests are mapped occur”. However, this embodiment is not limited thereto. For example, alternatively, the condition for transitioning from the primary screening state to the level 1 caching state may be “a currently received write access request and a previously arriving write access request are mapped to continuous physical addresses”.
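The state transitions of this embodiment can be illustrated by the following minimal sketch. It is not the disclosed implementation: the class name `StorageControlLogic`, the parameter `entry_run` (the number of jump-free address comparisons assumed to be needed to leave the initial state, standing in for the i requests), and the per-request boolean `jump` input are hypothetical. The sketch uses the alternative condition under which a single pair of continuous addresses moves the logic from the primary screening state to the level 1 caching state, and it assumes that the jump counter restarts whenever the primary screening state is entered.

```python
from enum import Enum


class State(Enum):
    INITIAL = "initial"                      # write allocate policy
    PRIMARY_SCREENING = "primary screening"  # no-write allocate policy
    LEVEL1_CACHING = "level 1 caching"       # no-write allocate policy


class StorageControlLogic:
    """Illustrative three-state machine; names and defaults are hypothetical."""

    def __init__(self, entry_run=3, y2=2):
        self.entry_run = entry_run  # assumed: jump-free comparisons needed to leave INITIAL
        self.y2 = y2                # Y2: continuous jumps needed to return to INITIAL
        self.state = State.INITIAL
        self.no_jump_run = 0        # consecutive comparisons without a jump
        self.jump_run = 0           # consecutive jumps seen in PRIMARY_SCREENING

    def policy(self):
        if self.state is State.INITIAL:
            return "write allocate"
        return "no-write allocate"

    def step(self, jump):
        """Consume one jump result (True = address jump) and return the policy in use."""
        if self.state is State.INITIAL:
            self.no_jump_run = 0 if jump else self.no_jump_run + 1
            if self.no_jump_run >= self.entry_run:
                self.state = State.PRIMARY_SCREENING
                self.jump_run = 0
        elif self.state is State.PRIMARY_SCREENING:
            if jump:
                self.jump_run += 1
                if self.jump_run >= self.y2:
                    self.state = State.INITIAL
                    self.no_jump_run = 0
            else:
                self.state = State.LEVEL1_CACHING
        else:  # LEVEL1_CACHING
            if jump:
                self.state = State.PRIMARY_SCREENING
                self.jump_run = 0  # assumption: counting restarts on entry
        return self.policy()
```

Under these assumptions, a single jump in the level 1 caching state only moves the logic back to the primary screening state, and Y2+1 continuous jumps are needed before the write allocate policy is restored.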

As can be known from the foregoing example, in this embodiment, in an operation such as memory copy that is used to process data of a low access frequency, unnecessary switching between the write allocate policy and the no-write allocate policy that is caused by any jump of physical addresses to which write access requests are mapped is avoided, and performance and efficiency of a processor are improved.

After the memory copy operation is ended, the storage control logic gradually restores the write allocate policy based on a subsequent write access request.

Embodiment 3

According to one embodiment, a storage control logic is implemented by a state machine. The state machine may include an initial state, a primary screening state, and a level 1 caching state to a level K caching state, but this application is not limited thereto, where K is a natural number greater than or equal to 2. Different states in the state machine may correspond to different state numbers. The state numbers are, for example, stored in registers.

FIG. 5c illustrates a schematic state transition diagram of the storage control logic according to this embodiment of the present invention.

Initial state: The storage control logic uses a write allocate policy. In this state, if determined jump result information indicates that no jump of physical addresses to which sequentially arriving i write access requests are mapped occurs, the storage control logic exits the initial state and enters the primary screening state; otherwise, the storage control logic keeps the initial state. A definition of i herein is the same as that in Embodiment 1, and is not described again herein.

Primary screening state: The storage control logic uses a no-write allocate policy. For example, this is implemented by screening write allocate information in a write access request. In this state, if determined jump result information indicates that Y3 continuous jumps of physical addresses to which sequentially arriving write access requests are mapped occur, the storage control logic exits the primary screening state and returns to the initial state; or if continuously arriving write access requests are mapped to continuous physical addresses, the storage control logic exits the primary screening state and enters the level 1 caching state. In other cases, no state transition occurs. Y3 may be set to a fixed value greater than 1, or may be determined based on a quantity of times that the foregoing memory copy function is invoked in a current memory copy operation, or may be set or dynamically adjusted based on another factor.

Level 1 caching state to level K caching state: The storage control logic uses the no-write allocate policy.

In the level 1 caching state, if determined jump result information indicates that any jump of physical addresses to which sequentially arriving write access requests are mapped occurs, the storage control logic returns to the primary screening state. In the level 1 caching state to the level K−1 caching state, if determined jump result information indicates that no jump of physical addresses to which a plurality of sequentially arriving write access requests are mapped occurs, the storage control logic transitions to a lower-level caching state. In the level 2 caching state to the level K caching state, if determined jump result information indicates that any jump of physical addresses to which sequentially arriving write access requests are mapped occurs, the storage control logic transitions to an upper-level caching state.

Similar to the foregoing Embodiment 2, in this embodiment, all of the primary screening state and the level 1 caching state to the level K caching state are states using the no-write allocate policy. However, screening strength of the primary screening state and the level 1 caching state to the level K caching state increases sequentially. A value of K may be determined based on a preset quantity y.

For example, to return from the level 1 caching state to the initial state, the storage control logic needs to go through the primary screening state. The storage control logic can restore the write allocate policy only when a quantity of continuous jumps of physical addresses to which sequentially arriving write access requests are mapped reaches Y3+1. For the level 2 caching state, to return from the level 2 caching state to the initial state, the storage control logic needs to go through the level 1 caching state and the primary screening state. In addition, after returning from the level 2 caching state to the level 1 caching state, if write access requests are mapped to a plurality of continuous physical addresses again, the storage control logic may transition from the level 1 caching state to the level 2 caching state again.

Therefore, screening strength of a caching state on each level increases sequentially (that is, the preset quantity y corresponding to the caching state on each level increases sequentially). In addition, in all such states, unnecessary switching between the write allocate policy and the no-write allocate policy that is caused by a single jump or a few jumps of physical addresses can be avoided.

Based on this embodiment, preferably, a condition for transitioning from the initial state to the primary screening state is “no jump of physical addresses to which sequentially arriving i write access requests are mapped occurs”, a condition for transitioning from the primary screening state to the level 1 caching state and transitioning from the caching state on each level to the lower-level caching state is “no jump of physical addresses to which a plurality of sequentially arriving write access requests are mapped occurs”, a condition for returning from the level 1 caching state to the primary screening state is “a jump of physical addresses to which two sequentially arriving write access requests are mapped occurs”, a condition for returning from the caching state on each level to the upper-level caching state is “a jump of physical addresses to which two sequentially arriving write access requests are mapped occurs”, and a condition for transitioning from the primary screening state to the initial state is “Y3 continuous jumps of physical addresses to which a plurality of sequentially arriving write access requests are mapped occur”. However, this embodiment is not limited thereto. For example, alternatively, the condition for transitioning from the primary screening state to the level 1 caching state or transitioning from the caching state on each level to the lower-level caching state may be “a currently received write access request and a previous write access request are mapped to continuous physical addresses”.
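The multi-level state machine of this embodiment can be sketched as follows. This is an illustrative model only: the class name `MultiLevelStorageControl`, the integer encoding of states (−1 for the initial state, 0 for the primary screening state, 1 to K for the caching states), and the parameter `entry_run` are hypothetical, and the sketch assumes the alternative condition under which one pair of continuous addresses is enough to move one level deeper.

```python
class MultiLevelStorageControl:
    """Illustrative K-level state machine; encoding and names are hypothetical."""

    INITIAL = -1  # write allocate policy
    PRIMARY = 0   # primary screening state; caching levels are 1..k

    def __init__(self, k=3, y3=2, entry_run=3):
        if k < 2:
            raise ValueError("K must be greater than or equal to 2")
        self.k = k                  # deepest caching level
        self.y3 = y3                # Y3: continuous jumps needed in PRIMARY to return to INITIAL
        self.entry_run = entry_run  # assumed: jump-free comparisons needed to leave INITIAL
        self.level = self.INITIAL
        self.no_jump_run = 0
        self.jump_run = 0

    def policy(self):
        if self.level == self.INITIAL:
            return "write allocate"
        return "no-write allocate"

    def step(self, jump):
        """Consume one jump result (True = address jump) and return the policy in use."""
        if self.level == self.INITIAL:
            self.no_jump_run = 0 if jump else self.no_jump_run + 1
            if self.no_jump_run >= self.entry_run:
                self.level, self.jump_run = self.PRIMARY, 0
        elif self.level == self.PRIMARY:
            if not jump:
                self.level = 1
            else:
                self.jump_run += 1
                if self.jump_run >= self.y3:
                    self.level, self.no_jump_run = self.INITIAL, 0
        elif jump:
            self.level -= 1  # one step toward the primary screening state
            if self.level == self.PRIMARY:
                self.jump_run = 0  # assumption: counting restarts on entry
        else:
            self.level = min(self.level + 1, self.k)  # saturate at level K
        return self.policy()
```

With these assumptions, returning from the level K caching state to the initial state takes K+Y3 continuous jumps, which matches the Y3+1 figure given above for the level 1 caching state.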

After the memory copy operation is ended, the storage control logic restores the write allocate policy based on a subsequent write access request.

Similarly to the foregoing Embodiment 1 and Embodiment 2, in this embodiment, in an operation such as memory copy that is used to process data of a low access frequency, unnecessary switching between the write allocate policy and the no-write allocate policy is reduced, and performance and efficiency of a processor are improved.

Embodiment 4

According to another embodiment, a storage control unit 123 may further include a register configured to store a cache depth value, and the storage control logic may implement a corresponding storage control method based on the cache depth value.

FIG. 6 illustrates a schematic flowchart of a storage control method according to this embodiment of the present invention.

As shown in FIG. 6, in operation 601, a write allocate policy is used in an initial phase, and a cache depth value is an initial value.

In operation 602, if determined jump result information indicates that no jump of physical addresses to which at least two sequentially arriving write access requests are mapped occurs, the cache depth value is increased based on a first preset gradient. For example, an operation of adding 1 is performed on the cache depth value.

In operation 603, if determined jump result information indicates that physical addresses to which at least two sequentially arriving write access requests are mapped are discontinuous, the cache depth value is decreased based on a second preset gradient. For example, an operation of subtracting 1 is performed on the cache depth value. The second preset gradient is, for example, equal to the first preset gradient.

In operation 604, whether the cache depth value is less than a specified threshold is determined, and if yes, operation 605 is performed to use a write allocate policy, or if no, operation 606 is performed to use a no-write allocate policy.

In some embodiments, the specified threshold is greater than or equal to a sum of the initial value and the first preset gradient (or the second preset gradient). Therefore, when a single jump or a few jumps of physical addresses to which sequentially arriving write access requests are mapped occur, restoration from the no-write allocate policy to the write allocate policy is not performed.
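Operations 601 to 606 can be sketched as a small counter-based controller. The sketch below is illustrative only: the class name `CacheDepthControl`, the default values, and the per-request boolean `jump` input are hypothetical. In addition, the clamping of the cache depth value to the range from the initial value to `max_depth` is an assumption that is not stated in this embodiment; it models a register of finite width.

```python
class CacheDepthControl:
    """Illustrative cache depth controller; names and defaults are hypothetical."""

    def __init__(self, initial=0, first_gradient=1, second_gradient=1,
                 threshold=2, max_depth=4):
        self.initial = initial
        self.first_gradient = first_gradient    # step added when no jump occurs
        self.second_gradient = second_gradient  # step subtracted when a jump occurs
        self.threshold = threshold              # specified threshold
        self.max_depth = max_depth              # assumed upper bound of the register
        self.depth = initial                    # operation 601

    def step(self, jump):
        """Consume one jump result (True = address jump) and return the policy in use."""
        if jump:
            # Operation 603: decrease, clamped at the initial value (assumption).
            self.depth = max(self.initial, self.depth - self.second_gradient)
        else:
            # Operation 602: increase, clamped at max_depth (assumption).
            self.depth = min(self.max_depth, self.depth + self.first_gradient)
        # Operation 604: compare against the specified threshold.
        if self.depth < self.threshold:
            return "write allocate"      # operation 605
        return "no-write allocate"       # operation 606
```

With the default values, four jump-free requests raise the depth to 4, and the write allocate policy is restored only after three continuous jumps bring the depth below the threshold of 2.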

In this embodiment, in an operation such as memory copy that is used to process data of a low access frequency, unnecessary switching between the write allocate policy and the no-write allocate policy that is caused by any jump of physical addresses to which write access requests are mapped can also be avoided, and performance and efficiency of a processor are improved.

Up to now, the storage control method, storage control apparatus, and related processing apparatus and computer system for selecting one of the write allocate policy and the no-write allocate policy based on whether any jump of physical addresses to which sequentially arriving write access requests are mapped occurs have been described by using examples.

In a conventional solution, if it is detected that continuous write access requests are mapped to continuous physical addresses (that is, no jump occurs), a no-write allocate policy is used to respond to the write access requests. If it is detected that any jump of physical addresses to which continuous write access requests are mapped occurs, a processor directly quits a no-write allocate operation, and switches to a write allocate operation.

However, for continuous write access requests, some data that may not be repeatedly accessed may not necessarily correspond to continuous physical addresses. For example, in a memory copy operation process or the like, the processor may need to jump to other addresses at regular intervals to perform memory move operations or the like; in some processes, data blocks that need to be continuously accessed may have continuous virtual addresses, but the continuous virtual addresses may be mapped to discontinuous physical addresses. In the conventional solution, a write allocate operation may be used in such cases of physical address jumps. Therefore, both the efficiency and the performance of the processor are reduced.
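The effect of address translation described above can be illustrated with a short sketch. All values are assumptions chosen for illustration: a page size of 4096 bytes, a write granularity of 64 bytes, a hypothetical page table that maps two adjacent virtual pages to non-adjacent physical frames, and the helper names `translate` and `jump_flags`.

```python
PAGE_SIZE = 4096  # assumed page size in bytes
STRIDE = 64       # assumed size of each write access in bytes


def translate(vaddr, page_table):
    """Map a virtual address to a physical address through a page table."""
    frame = page_table[vaddr // PAGE_SIZE]
    return frame * PAGE_SIZE + vaddr % PAGE_SIZE


def jump_flags(paddrs, stride=STRIDE):
    """Return True for each adjacent pair of addresses that is not continuous."""
    return [cur != prev + stride for prev, cur in zip(paddrs, paddrs[1:])]


# Hypothetical mapping: virtual pages 0 and 1 land in physical frames 7 and 3.
page_table = {0: 7, 1: 3}

# A copy that writes two virtual pages sequentially, 64 bytes at a time.
vaddrs = range(0, 2 * PAGE_SIZE, STRIDE)
paddrs = [translate(v, page_table) for v in vaddrs]
flags = jump_flags(paddrs)
```

In this example the virtual addresses are fully continuous, yet `flags` contains exactly one jump, at the page boundary. A conventional solution would switch to the write allocate operation at that point even though the copied data is unlikely to be accessed again.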

In comparison with the conventional solution, the storage control method, storage control apparatus, processing apparatus, and computer system provided by the embodiments of the present disclosure can detect whether a plurality of jumps of the physical addresses to which the sequentially arriving write access requests are mapped occur, and if a quantity of continuous jumps of the physical addresses to which the plurality of sequentially arriving write access requests are mapped is less than the preset quantity, keep using the no-write allocate policy, instead of storing the written data specified by the write access requests in the cache and/or the memory, to avoid, as much as possible, completing storage of the written data by selecting the write allocate policy during processing of information of a low access probability, avoid storing written data of a low access probability in the cache, improve performance and efficiency of the computer system, and enhance robustness and stability of the processor and the computer system.

This application further discloses a computer readable storage medium including computer executable instructions stored thereon. When executed by the processor, the computer executable instructions cause the processor to execute the methods of the embodiments described herein.

In addition, this application further discloses a system. The system includes an apparatus configured to implement the method according to each embodiment in this specification.

This application further discloses a processing apparatus. The processing apparatus includes the foregoing processor or processor core, or a system-on-a-chip that integrates the foregoing processor or processor core.

It should be appreciated that the foregoing descriptions are merely exemplary embodiments of the present invention and are not intended to limit the present invention. For those skilled in the art, there are many variations for the embodiments of this specification. Any modification, equivalent replacement, and improvement made without departing from the spirit and principle of the present invention shall fall within the protection scope of the present invention.

For example, in some embodiments, the storage control unit may include an enable register. Enabling and disabling of the storage control unit may be set by configuring at least one numeric value in the enable register.

It should be understood that the embodiments in this specification are all described in a progressive manner. For same or similar parts in the embodiments, mutual reference may be made, and each embodiment focuses on a difference from other embodiments. In particular, the method embodiment is essentially similar to the method described in the apparatus embodiment and system embodiment, and therefore is described briefly. For related parts, reference may be made to partial descriptions in the other embodiments.

It should be understood that specific embodiments in this specification are described above. Other embodiments fall within the scope of the claims. In some cases, actions or steps described in the claims may be performed in a sequence different from those in the embodiments, and expected results can still be achieved. In addition, illustrated specific sequences or continuous sequences are not necessarily required for the processes described in the drawings to achieve the expected results. In some implementations, multi-task processing and parallel processing are also allowed or may be advantageous.

It should be understood that a component described in a singular form herein or only one component shown in the accompanying drawings does not mean that a quantity of such components is limited to one. In addition, separate modules or components described or shown herein may be combined into one module or component, and one module or component described or shown herein may be split into a plurality of modules or components.

It should be further understood that the terms and expressions used herein are used for description only, and that one or more embodiments of this specification should not be limited to these terms and expressions. Use of these terms and expressions does not imply exclusion of any equivalent features indicated or described (or partial features thereof), and it should be recognized that any possible modifications should also fall within the scope of the claims. Other modifications, changes, and replacements may also exist. Correspondingly, the claims shall be considered to cover all these equivalents.

What is claimed is:
 1. A storage control method, comprising: detecting whether any jump of physical addresses to which a plurality of sequentially arriving write access requests are mapped occurs; and selecting one of a write allocate policy and a no-write allocate policy to process the plurality of sequentially arriving write access requests, wherein the storage control method further comprises: selecting initially the write allocate policy if a cache is not hit and when a quantity of continuous jumps of the physical addresses to which the plurality of sequentially arriving write access requests are mapped is greater than or equal to a preset quantity, and switching to the no-write allocate policy to process the plurality of sequentially arriving write access requests if the cache is not hit and no jump of the physical addresses to which the plurality of sequentially arriving write access requests are mapped occurs, wherein, in the no-write allocate policy, if the quantity of continuous jumps of the physical addresses to which the plurality of sequentially arriving write access requests are mapped is less than the preset quantity, continuing using the no-write allocate policy instead of selecting a write allocate policy, and wherein the preset quantity is greater than 1.
 2. The storage control method according to claim 1, wherein the preset quantity is set to a fixed value, or is determined based on a quantity of times that a memory access function is invoked, and wherein the memory access function is implemented by at least one of the plurality of sequentially arriving write access requests.
 3. The storage control method according to claim 1, wherein the write access request comprises: a physical address to which a storage instruction is mapped; and written data specified by the storage instruction, wherein in the no-write allocate policy, the written data is written to a memory and is not written to the cache.
 4. The storage control method according to claim 3, further comprising: in the no-write allocate policy, if the quantity of continuous jumps of the physical addresses to which the plurality of sequentially arriving write access requests are mapped is greater than or equal to the preset quantity, using the write allocate policy, wherein in the write allocate policy, the written data is written to the cache.
 5. The storage control method according to claim 4, wherein the storage control method further comprises: in an initial state, selecting to use the write allocate policy, and if no jump of the physical addresses to which the plurality of sequentially arriving write access requests are mapped occurs, exiting the initial state, and entering a primary screening state; and in the primary screening state, using the no-write allocate policy, and if the quantity of continuous jumps of the physical addresses to which the plurality of sequentially arriving write access requests are mapped is equal to the preset quantity, returning to the initial state.
 6. The storage control method according to claim 5, wherein the storage control method further comprises: in the primary screening state, if no jump of the physical addresses to which the plurality of sequentially arriving write access requests are mapped occurs, entering a level 1 caching state; and in the level 1 caching state, if any jump of the physical addresses to which the plurality of sequentially arriving write access requests are mapped occurs, returning to the primary screening state.
 7. The storage control method according to claim 6, wherein the storage control method further comprises: in the level 1 caching state to a level K−1 caching state, if no jump of the physical addresses to which the plurality of sequentially arriving write access requests are mapped occurs, transitioning to a lower-level caching state; and in a level 2 caching state to a level K caching state, if any jump of the physical addresses to which the plurality of sequentially arriving write access requests are mapped occurs, transitioning to an upper-level caching state, wherein K is a natural number greater than or equal to 2.
 8. The storage control method according to claim 4, wherein the storage control method further comprises: in an initial phase, using the write allocate policy, and resetting a cache depth value to an initial value; if no jump of the physical addresses to which the plurality of sequentially arriving write access requests are mapped occurs, increasing the cache depth value based on a first preset gradient; if any jump of the physical addresses to which the plurality of sequentially arriving write access requests are mapped occurs, decreasing the cache depth value based on a second preset gradient; and when the cache depth value is less than a specified threshold, selecting to use the write allocate policy, or when the cache depth value is greater than or equal to the specified threshold, selecting to use the no-write allocate policy.
 9. The storage control method according to claim 8, wherein the specified threshold is greater than or equal to a sum of the initial value and the first preset gradient.
 10. The storage control method according to claim 4, wherein the write access request further comprises write policy information, and wherein the write policy information indicates one of the write allocate policy and the no-write allocate policy; and by screening the write policy information of the write access request, use the no-write allocate policy; or use the write allocate policy based on the write policy information of the write access request.
 11. The storage control method according to claim 10, further comprising: obtaining an entry that matches a virtual address specified by the storage instruction; translating, based on an identifier of the entry and the written data, the virtual address specified by the storage instruction into the physical address to which the storage instruction is mapped; and providing the write policy information based on an attribute flag of the entry.
 12. The storage control method according to claim 10, wherein the write policy information is provided by a global register.
 13. The storage control method according to claim 4, wherein the storage control method further comprises: initiating a read request to the memory in the write allocate policy, and storing a data block returned by the memory in the cache, so that the data block is modified based on the written data.
 14. The storage control method according to claim 4, wherein the storage control method further comprises: initiating a read request to the memory in the no-write allocate policy, so that a corresponding data block in the memory is modified based on the written data.
 15. A storage control apparatus, comprising: an address detection unit implemented in a processor, and adapted to detect whether any jump of physical addresses to which a plurality of sequentially arriving write access requests are mapped occurs; and a logic control unit implemented in the processor, coupled to the address detection unit, and adapted to select one of a write allocate policy and a no-write allocate policy to process the plurality of sequentially arriving write access requests, wherein the logic control unit is adapted to: select initially the write allocate policy if a cache is not hit and when a quantity of continuous jumps of the physical addresses to which the plurality of sequentially arriving write access requests are mapped is greater than or equal to a preset quantity; and switch to the no-write allocate policy to process the plurality of sequentially arriving write access requests if the cache is not hit and no jump of the physical addresses to which the plurality of sequentially arriving write access requests are mapped occurs, wherein, in the no-write allocate policy, if the quantity of continuous jumps of the physical addresses to which the plurality of sequentially arriving write access requests are mapped is less than the preset quantity, the logic control unit keeps using the no-write allocate policy instead of selecting the write allocate policy, and wherein the preset quantity is greater than 1.
 16. The storage control apparatus according to claim 15, wherein the write access request comprises: a physical address to which a storage instruction is mapped; and written data specified by the storage instruction, wherein in the no-write allocate policy, the written data is written to a memory and is not written to the cache.
 17. The storage control apparatus according to claim 16, wherein the logic control unit implemented in the processor is further adapted to: in the no-write allocate policy, if the quantity of continuous jumps of the physical addresses to which the plurality of sequentially arriving write access requests are mapped is greater than or equal to the preset quantity, select the write allocate policy, wherein in the write allocate policy, the written data is written to the cache.
 18. The storage control apparatus according to claim 17, wherein the logic control unit implemented in the processor is further adapted to: in an initial state, select the write allocate policy, and if no jump of the physical addresses to which the plurality of sequentially arriving write access requests are mapped occurs, exit the initial state, and enter a primary screening state; and in the primary screening state, select the no-write allocate policy, and if the quantity of continuous jumps of the physical addresses to which the plurality of sequentially arriving write access requests are mapped is equal to the preset quantity, return to the initial state.
 19. The storage control apparatus according to claim 18, wherein the logic control unit implemented in the processor is further adapted to: in the primary screening state, if no jump of the physical addresses to which the plurality of sequentially arriving write access requests are mapped occurs, enter a level 1 caching state; and in the level 1 caching state, if any jump of the physical addresses to which the plurality of sequentially arriving write access requests are mapped occurs, return to the primary screening state.
 20. The storage control apparatus according to claim 19, wherein the logic control unit implemented in the processor is further adapted to: in the level 1 caching state to a level K−1 caching state, if no jump of the physical addresses to which the plurality of sequentially arriving write access requests are mapped occurs, transition to a lower-level caching state; and in a level 2 caching state to a level K caching state, if any jump of the physical addresses to which the plurality of sequentially arriving write access requests are mapped occurs, transition to an upper-level caching state, wherein K is a natural number greater than or equal to 2.
 21. The storage control apparatus according to claim 17, further comprising a register configured to store a cache depth value, wherein the logic control unit implemented in the processor is further adapted to: in an initial phase, select the write allocate policy, and reset the cache depth value to an initial value; if the plurality of sequentially arriving write access requests are sequentially mapped to continuous physical addresses, increase the cache depth value based on a first preset gradient; if any jump of the physical addresses to which the plurality of sequentially arriving write access requests are mapped occurs, decrease the cache depth value based on a second preset gradient; and when the cache depth value is less than a specified threshold, select the write allocate policy, or when the cache depth value is greater than or equal to the specified threshold, select the no-write allocate policy.
 22. The storage control apparatus according to claim 21, wherein the specified threshold is greater than or equal to a sum of the initial value and the first preset gradient.
 23. The storage control apparatus according to claim 17, wherein the write access request further comprises write policy information, wherein the write policy information indicates one of the write allocate policy and the no-write allocate policy; and wherein the logic control unit implemented in the processor is configured to perform the following: screening the write policy information of the write access request, to select the no-write allocate policy; or using the write allocate policy based on the write policy information of the write access request.
 24. A processing apparatus, wherein the processing apparatus is a processor, a processor core, or a system-on-a-chip, and comprises: the storage control apparatus according to claim 23; an instruction execution unit implemented in a cache, and adapted to provide the write access request on the storage instruction; and a hardware register, adapted to provide the write policy information as a global register.
 25. The processing apparatus according to claim 24, further comprising: a memory management unit implemented in the cache, coupled to the hardware register, and adapted to provide an entry that matches a virtual address specified by the storage instruction, to translate the virtual address based on the entry into the physical address to which the storage instruction is mapped and provide the write policy information to the instruction execution unit.
 26. The storage control apparatus according to claim 17, further comprising: a read cache unit implemented in the cache, and adapted to initiate a read request to the memory in the write allocate policy, and store a data block returned by the memory in the cache, so that the data block is modified based on the written data.
 27. The storage control apparatus according to claim 17, further comprising: a write cache unit implemented in the cache, and adapted to initiate a write request to the memory in the no-write allocate policy, so that a corresponding data block in the memory is modified based on the written data.
 28. The storage control apparatus according to claim 15, wherein the preset quantity is set to a fixed value, or is determined based on a quantity of times that a memory access function is invoked, and wherein the memory access function is implemented by at least one of the plurality of sequentially arriving write access requests.
 29. A processing apparatus, wherein the processing apparatus is a processor, a processor core, or a system-on-a-chip, and comprises the storage control apparatus according to claim 15.
 30. A computer system, comprising: the processing apparatus according to claim 29; a cache, coupled to the storage control apparatus; and a memory, coupled to the storage control apparatus.
 31. The computer system according to claim 30, wherein the computer system is implemented by a system-on-a-chip.